Abstract: In interval censored models with current status observations, the
variables are indicators of the presence of individuals on observation intervals
and covariates. When several individuals share the same observation
interval, a simple procedure provides new estimators for the distribution of
the observation times and their intensity, in a closed form. They are
$n^{1/2}$-consistent for piecewise constant covariates. Estimators of the sample sizes
are deduced, and asymptotic tests are proposed for independence of the observations
on consecutive intervals and for independence between consecutive classes
for the observed individuals.
1. Introduction



Statistical inference for sequential observations of individuals in a large popula- 
tion differs according to the nature of the samples. The observation of presence 
of individuals at specific locations is often restricted to a sequence of time inter- 
vals. In capture-recapture models, the size of finite and closed populations has 
been estimated under the assumptions of the same parametric model for the 
consecutive samples and time-dependent intensities for the transitions of the 
populations between several states, with individual covariates [ll[6j[7]- 
The discrete observation sampling leads to cumulative observations on fixed
or random intervals; it is an interval censored model with only current status
observations. With individual observation times for all the individuals, the
monotonic nonparametric maximum likelihood estimator of the time-dependent
cumulative hazard function relies on the greatest convex minorant algorithm; it
weighs the random observation times and converges at the rate $n^{1/3}$ (see [2, 3]
and [1] in a model with constant covariates). Here a nonparametric Markov
model with piecewise constant covariate processes is considered as in [5] for
continuous observations, and the observations are current status data with common
observation intervals. A simple reparametrization leads to easily calculated
parametric estimators for the distribution functions of the observation times, and
the population sizes are estimated (section 3). The convergence rate of the
estimators in several nonparametric models is $n^{1/2}$. In section 4, models with
dependent observations on consecutive time intervals are considered and new
estimators and tests for independence are proposed.

2. Models with independent observations 

Consider a population of $L$ independent classes $C_1, \ldots, C_L$ of respective
unknown sizes $\nu_l$, $l = 1, \ldots, L$, and $\nu = \nu_1 + \ldots + \nu_L$. In each class, a sample of
the population is performed on a time interval $[0, \tau]$ with random sampling sizes
$n_l$, $l = 1, \ldots, L$, and $n$. Let $\tau_{l,1} < \ldots < \tau_{l,K_l} < \tau$ be the end-points of the observation
intervals for class $C_l$ and $(N_{li}(t))_{t<\tau}$ be the counting process of the observations
of individual $i$ of $C_l$ restricted to the intervals $I_{l,k} = ]\tau_{l,k-1}, \tau_{l,k}]$, $k = 1, \ldots, K_l$,
up to time $t$,

$$N_{li}(t) = \sum_{k=1}^{K_l} \delta_{li,k} 1\{I_{l,k} \cap [0, t] \neq \emptyset\}, \quad \text{with } \delta_{li,k} = 1\{i \in C_l \text{ is observed on } I_{l,k}\},$$

with $N_{li}(\tau) \le K_l$ and $\sum_i 1\{N_{li}(\tau) > 0\} = n_l$. Only the cumulated numbers $N_{li}(I_{l,k})$
are observed.

An individual $i$ of $C_l$ is supposed to be characterized by a $p$-dimensional
random covariate vector process $Z_{li}$ having left-continuous sample paths with
right-hand limits. The individuals are sampled independently and for $l = 1, \ldots, L$,
the processes $(N_{li}, Z_{li})$, $i = 1, \ldots, n_l$, are mutually independent and identically
distributed. The distribution of $N_{li}$ conditionally on $Z_{li}$ is supposed to follow a
Markov model with independent increments, where the probability of observing
individuals only depends on their characteristics on the observation interval

$$\Pr(N_{li}(I_k) \mid (Z_{li}(s))_{s<\tau}) = \Pr(N_{li}(I_k) \mid Z_{li}(I_{l,k})), \tag{1}$$

and only a countable set of values of the process $Z$ appears in the whole sample path.

The process $Z_{li}$ is sometimes restricted to a piecewise constant process with
values $Z_{l,j}$ on a random sub-partition $I'_{li,j} = [U_{li,j-1}, U_{li,j}[$, $j = 1, \ldots, J$, of
$(I_{l,k})_{l,k}$,

$$Z_{li}(t) = \sum_{j=1}^{J} Z_{l,j} 1\{t \in I'_{li,j}\}. \tag{2}$$

The probability of observation of $i \in C_l$ on the partitions $(I_{l,k})_k$ is a discrete
process defined according to the assumption (1) or (2). Let $T_{li,k}$ be the unknown
first presence time of $i$ during the time interval $I_{l,k}$, and we suppose that the
model is defined by

$$p_{l,k}(Z_{li}) = \Pr(\tau_{l,k-1} < T_{li,k} \le \tau_{l,k} \mid Z_{li})
= \sum_j 1\{I'_{li,j} \subset I_{l,k}\} \Pr(U_{li,j-1} < T_{li,k} \le U_{li,j} \mid Z_{li}(U_{li,j-1})),$$

$$P_l(Z_{l,j}) = \Pr(Z_{li}(U_{li,j-1}) = Z_{l,j}),$$

$$p_l = \Pr(N_{li}(\tau_{l,K_l}) > 0) = \int \Pr(N_{li}(\tau_{l,K_l}) > 0 \mid Z_{li}(\tau_{l,K_l})) \, dP_l(Z_{li})
= \sum_{k=1}^{K_l} \sum_{j=1}^{J} \Pr(N_{li}(I'_{li,j}) > 0 \mid Z_{li}(U_{li,j-1}) = Z_{l,j}) \, P_l(Z_{l,j}),$$

$$1 - p_l = \Pr(N_{li}(\tau_{l,K_l}) = 0).$$

However, individuals $i$ with $N_{li}(\tau_{l,K_l}) = 0$ are not observed. An underlying
time-continuous model is defined by the intensities of observation of the individuals.
The conditional intensity of observation of class $C_l$ is supposed to depend only
on the current value of the covariate; for individual $i$ in $C_l$ and $t$ in $I_{l,k}$, it is
defined by

$$\lambda_{l,k}(t, z) = \lim_{h \to 0} \frac{1}{h} \Pr(N_{li}(t + h) - N_{li}(t) > 0 \mid Z_{li}(t) = z).$$

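More generally, the capture intensity for class $l$ is defined as one of the intensities
$\lambda_{l,k}$ by

$$\lambda_l(t, Z_{li}) = \lim_{h \to 0} \frac{1}{h} \Pr(N_{li}(t + h) - N_{li}(t) > 0 \mid Z_{li}(t))
= \sum_{k=1}^{K_l} \sum_{j=1}^{J} 1\{t \in I_{l,k} \cap I'_{li,j}\} \, \lambda_{l,k}(t, Z_{l,j}) \quad \text{under (2)}.$$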

The variations of the cumulative intensities on each sub-interval are denoted

$$\Delta\Lambda_{l,k}(t, Z_{li}) = \int_{I_{l,k} \cap [0,t]} \lambda_{l,k}(s, Z_{li}(s)) \, ds
= \sum_j 1\{I'_{li,j} \subset I_{l,k}\} \int_{I'_{li,j} \cap [0,t]} \lambda_{l,k}(s, Z_{l,j}) \, ds$$

under (2), and the cumulative intensity from 0 is

$$\Lambda_l(t, Z_{li}) = \sum_{k=1}^{K_l} 1\{t \in I_{l,k}\} \sum_{k'=1}^{k} \Delta\Lambda_{l,k'}(t, Z_{li}).$$

The unobserved appearance time $T_{li,k}$ of $i$ in $C_l$ during the time interval $I_{l,k}$ has
a conditional distribution $\Pr\{T_{li,k} \le t \mid Z_{li}(\tau_{l,k}) = Z_{l,j}\} = 1 - S_l(t, Z_{l,j})$, for a
covariate value $Z_{l,j}$. The probability of observation in $C_l$ is continuously defined
as

$$p_{l,k}(t, z) = \Pr(N_{li}(t) - N_{li}(\tau_{l,k-1}) > 0 \mid Z_{li}(t) = z) = S_l(\tau_{l,k-1}, z) - S_l(t, z)
= \exp\{-\Lambda_l(\tau_{l,k-1}, z)\} - \exp\{-\Lambda_l(t, z)\}, \quad t \in I_{l,k},$$

$$p_l(t, Z_{li}) = \Pr(N_{li}(t) > 0 \mid Z_{li}) = 1 - \exp\{-\Lambda_l(t, Z_{li}(t))\},$$

where $p_l(t, Z_{li})$ is the distribution function of observation for an individual of $C_l$
before $t$ conditionally on the covariate. For $t$ in $I_{l,k}$, it is written
$p_l(t, Z_{li}) = \sum_{k'<k} p_{l,k'}(Z_{li}) + p_{l,k}(t, Z_{li})$.
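
As a numerical illustration of the relation $p_{l,k} = S_l(\tau_{l,k-1}) - S_l(\tau_{l,k})$, the following minimal sketch assumes a single class, no covariates and a piecewise-constant intensity on each observation interval. The function name, the end-points and the intensity values are hypothetical and only serve the example.

```python
import math

def interval_probabilities(taus, lambdas):
    """Return p_k = S(tau_{k-1}) - S(tau_k) for a piecewise-constant intensity.

    taus    : increasing end-points tau_1 < ... < tau_K (tau_0 = 0 is implied)
    lambdas : assumed constant intensity on each interval ]tau_{k-1}, tau_k]
    """
    probs = []
    cum_hazard = 0.0   # Lambda(tau_{k-1})
    prev_tau = 0.0
    for tau, lam in zip(taus, lambdas):
        s_prev = math.exp(-cum_hazard)           # S(tau_{k-1})
        cum_hazard += lam * (tau - prev_tau)     # Lambda(tau_k)
        probs.append(s_prev - math.exp(-cum_hazard))
        prev_tau = tau
    return probs

# Hypothetical end-points and intensities.
p = interval_probabilities([1.0, 2.0, 4.0], [0.5, 0.2, 0.1])
print(p, sum(p))
```

The interval probabilities add up to $1 - \exp\{-\Lambda_l(\tau_{l,K_l})\}$, the probability of being observed before the last end-point.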

In a discrete nonparametric model, the hazard function of individual i in 
Ci with covariate value Zij on an interval I^^ ^ is written J2k ■^i,k(tj G 

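The proportional hazards model is defined by multiplicative intensities

$$\lambda_{l,k}(t, Z_{li}(t)) = \lambda_l(t) \sum_{j=1}^{J} e^{\beta_{l,k}' Z_{l,j}} 1\{t \in I'_{li,j}\},$$

then

$$\Delta\Lambda_{l,k}(t, Z_{li}(t)) = \sum_{j=1}^{J} e^{\beta_{l,k}' Z_{l,j}} \int_{I_{l,k} \cap I'_{li,j} \cap [0,t]} \lambda_l(s) \, ds.$$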

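Let $S_l(t) = \exp\{-\int_0^t \lambda_l(s) \, ds\}$, for the $\nu_l$ individuals, then the probability of
being unobserved is $\Pr(T_{li} > \tau_{l,K_l}) = 1 - p_l(\tau_{l,K_l})$, where $T_{li}$ is the first presence
time of $i$,

$$1 - p_l(\tau_{l,K_l}, Z_{li}) = \exp\{-\Lambda_l(\tau_{l,K_l}, Z_{li})\} = \prod_{k=1}^{K_l} \exp\{-\Delta\Lambda_{l,k}(\tau_{l,k}, Z_{li})\}$$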

^ ^ SiiTi,k-i,Zu)^' 
and the conditional observation probability of $i$ on $I_{l,k}$ is



J 

cxp{ft'.fcZi,j} 



3. Identifiability and estimation of the parameters 
3.1. Model without covariates 

Without covariates the parameters are only the probabilities $p_{l,k}$ and $p_l(\tau_{l,K_l})$.
Assuming that the observations on the different intervals are independent, the
model is multinomial and the probabilities of independent observations on the
$K_l + 1$ intervals are written with the differences $\Delta_{l,k} = \Delta\Lambda_{l,k}(I_{l,k}) > 0$,
$1 \le k \le K_l$,

$$1 - p_l(\tau_{l,K_l}) = 1 - \sum_{k \le K_l} p_{l,k},$$

$$\log(1 - p_l(\tau_{l,k})) = \sum_{k' \le k} \{\log S_l(\tau_{l,k'}) - \log S_l(\tau_{l,k'-1})\} = -\sum_{k' \le k} \Delta_{l,k'}, \tag{3}$$

$$\log p_{l,k} = \log\{S_l(\tau_{l,k-1}) - S_l(\tau_{l,k})\} = -\sum_{k'<k} \Delta_{l,k'} + \log\{1 - \exp(-\Delta_{l,k})\}.$$

The log-likelihood for class $C_l$ is

$$l_n(l) = \sum_{i=1}^{n_l} \sum_{k \le K_l} \{\delta_{li,k} \log p_{l,k} + (1 - \delta_{li,k}) \log(1 - p_{l,k})\}$$

under ^ and the MLE of the parameters pi^k and the function Si are 

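$$\hat p_{nl,k} = n_l^{-1} \sum_{i=1}^{n_l} \delta_{li,k}, \qquad \hat p_{nl}(\tau_{l,K_l}) = n_l^{-1} \sum_{i=1}^{n_l} \sum_{k=1}^{K_l} \delta_{li,k},$$

$$\hat S_{nl}(\tau_{l,k}) = \hat S_{nl}(\tau_{l,k-1}) - \hat p_{nl,k} = 1 - n_l^{-1} \sum_{i=1}^{n_l} \sum_{k'=1}^{k} \delta_{li,k'}.$$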
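
The closed-form estimators above are simple column averages of the indicators. The sketch below assumes, for illustration only, that the data of one class are stored as a list of rows `delta[i][k]`, with at most one indicator equal to 1 per row (the interval of first presence); this layout and the function name are hypothetical, not the paper's.

```python
def empirical_estimators(delta):
    """Column means p_hat[k] and the decreasing step values S_hat(tau_k).

    delta[i][k] = 1 if individual i is first observed on interval k, else 0
    (hypothetical data layout for a single class).
    """
    n = len(delta)
    n_intervals = len(delta[0])
    p_hat = [sum(row[k] for row in delta) / n for k in range(n_intervals)]
    s_hat = []
    cumulated = 0.0
    for pk in p_hat:
        cumulated += pk
        s_hat.append(1.0 - cumulated)   # S_hat(tau_k) = 1 - sum_{k' <= k} p_hat[k']
    return p_hat, s_hat

# Toy indicators for five sampled individuals and three intervals.
delta = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
p_hat, s_hat = empirical_estimators(delta)
print(p_hat, s_hat)
```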

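The estimator $\hat S_{nl}$ is decreasing with weights at the sampling times $\tau_{l,k}$. From
(3), the differences $\Delta_{l,k}$ satisfy

$$\Delta_{l,k} = \log \frac{1 - \sum_{k'<k} p_{l,k'}}{1 - \sum_{k' \le k} p_{l,k'}} > 0,$$

their estimators are deduced from the $\hat p_{nl,k}$, and the cumulative hazard function
for $C_l$ is estimated by

$$\hat\Lambda_{nl}(t) = \sum_{k=1}^{K_l} 1\{\tau_{l,k-1} < t \le \tau_{l,k}\} \log \frac{1 - \sum_{k'<k} \hat p_{nl,k'}}{1 - \sum_{k' \le k} \hat p_{nl,k'}}.$$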
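
The increments $\Delta_{l,k}$ can be computed directly from estimated interval probabilities. This is a minimal sketch under the assumption that the estimated probabilities sum to less than one, so that every logarithm is finite; the function name and the input values are hypothetical.

```python
import math

def hazard_increments(p_hat):
    """Delta_k = log{(1 - sum_{k'<k} p_k') / (1 - sum_{k'<=k} p_k')}."""
    increments = []
    cumulated = 0.0
    for pk in p_hat:
        before = 1.0 - cumulated      # estimated S(tau_{k-1})
        cumulated += pk
        after = 1.0 - cumulated       # estimated S(tau_k)
        if after <= 0.0:
            raise ValueError("survival estimate reaches zero; increment is infinite")
        increments.append(math.log(before / after))
    return increments

increments = hazard_increments([0.4, 0.2, 0.2])
# By (3), the increments up to tau_k sum to -log S(tau_k).
print(increments, sum(increments))
```

Summing the increments up to interval $k$ recovers $-\log \hat S_{nl}(\tau_{l,k})$, which is relation (3) applied to the estimators.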
Let $p_{0l,k}$, $S_{0l}$ and $\Lambda_{0l}$ be the actual values of the model parameters, then

Proposition 3.1 The estimators $\hat p_{nl,k}$, $\hat\Lambda_{nl,k}$ and $\hat S_{nl}$ are a.s. consistent as
$n \to \infty$, $n_l^{1/2}(\hat p_{nl,k} - p_{0l,k})_k$ converges to a centered Gaussian variable with
covariances $p_{0l,k}(1 - p_{0l,k})$ and zero otherwise, and the processes $n_l^{1/2}(\hat S_{nl} - S_{0l})$
and $n_l^{1/2}(\hat\Lambda_{nl} - \Lambda_{0l})$ converge to centered Gaussian processes with independent
increments and variances

$$n_l E(\hat S_{nl} - S_{0l})^2(\tau_{l,k}) = \sum_{k' \le k} p_{0l,k'}(1 - p_{0l,k'}),$$

niE{Ani - Aoif{Ti^k) = ^ Poi,k'0--Poi,k') ( Y>r(T, > t, f ^^ '' 

k'<k V r Tr u 



Pr{Tu > Ti,k-i)Pv{Tu > n^k) 

+POi,fe(l -Poi,k) 



PriTli > Tl,k) 



3.2. Models with covariates 



The parameters of the model are the probabilities $p_l$ and $p_{l,k} = p_l(I_{l,k})$, or the
functions $p_l(z)$ and $p_{l,k}(z) = p_l(I_{l,k}, z)$ in a regression model. The probabilities
$p_l$ are expressions of the $p_{l,k}$'s and of the distribution of the covariates; their
estimators satisfy

$$\hat p_{nl,k} = \sum_{j=1}^{J} \hat p_{nl,k}(Z_{l,j}) \hat P_{nl}(Z_{l,j}), \tag{5}$$

$$\hat p_l = \sum_{k=1}^{K_l} \sum_{j=1}^{J} \hat p_{nl,k}(Z_{l,j}) \hat P_{nl}(Z_{l,j}),$$

but the distributions $P_l$ are not directly estimable since not all the individuals are
observed. Only the probabilities $\Pr(Z_{li} \le z \mid \delta_{li,k} = 1)$ are directly estimable
as the proportion of the individuals observed in $I_{l,k}$ such that $Z_{li} \le z$. Then
$P_l(z)$ is deduced from the equation

$$P_l(z) = \frac{\sum_{k} \Pr(Z_{li} \le z \mid \delta_{li,k} = 1) \Pr(\delta_{li,k} = 1)}{\sum_{k} \Pr(\delta_{li,k} = 1 \mid Z_{li} \le z)}, \quad \forall i = 1, \ldots, n, \tag{6}$$

which is easily estimated with the empirical probabilities.

The estimable parameters are always the values of the functions $S_l$ and $\Lambda_l$
at the observation times $\tau_{l,k}$, and model parameters when it is appropriate.
Conditionally on the covariates, the log-likelihood for class $C_l$ is

$$l_n(l) = \sum_{i=1}^{n_l} \sum_{k \le K_l} \{\delta_{li,k} \log p_{l,k}(Z_{li}) + (1 - \delta_{li,k}) \log(1 - p_{l,k}(Z_{li}))\}$$

$$= \sum_{i=1}^{n_l} \sum_{k \le K_l} \sum_{j=1}^{J} 1\{I'_{li,j} \subset I_{l,k}\} \{\delta_{li,k} \log p_{l,k}(Z_{l,j}) + (1 - \delta_{li,k}) \log(1 - p_{l,k}(Z_{l,j}))\}.$$

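The MLEs are identical to the previous estimators if the covariates are constant on the
intervals $I_{l,k}$ and $p_{l,k}(Z_{li}) = p_{l,k}$. If $J$ is finite, and the variations of the processes
$Z_{l,i}$ are observed though those of $N_{l,i}$ are only observed on $I_{l,k}$, $i = 1, \ldots, n$,
they are modified

$$\hat p_{nl,k}(Z_{l,j}) = n_l^{-1} \sum_{i=1}^{n_l} \delta_{li,k} 1\{I'_{li,j} \subset I_{l,k}\},$$

$$\hat S_{nl}(\tau_{l,k}, Z_{l,j}) = 1 - n_l^{-1} \sum_{i=1}^{n_l} \sum_{k'=1}^{k} \delta_{li,k'} 1\{I'_{li,j} \subset I_{l,k'}\},$$

$$\hat S_{nl}(\tau_{l,k}, z) = 1 - n_l^{-1} \sum_{i=1}^{n_l} \sum_{k'=1}^{k} \delta_{li,k'} \sum_{j=1}^{J} 1\{Z_{l,j} = z\},$$

$$\hat\Lambda_{nl}(t, z) = \sum_{k=1}^{K_l} \sum_{j=1}^{J} 1\{\tau_{l,k-1} < t \le \tau_{l,k}\} 1\{Z_{l,j} = z\} \log \frac{1 - \sum_{k'<k} \hat p_{nl,k'}(z)}{1 - \sum_{k' \le k} \hat p_{nl,k'}(z)}.$$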



With a continuous covariate and under (1), kernel estimators of the functions
conditionally on $z$ are defined with a kernel $K$, a bandwidth $h$ and $K_h(x) =
h^{-1} K(h^{-1} x)$, by smoothing these estimators or the previous ones

$$\hat p_{nl,k}(z) = \frac{\sum_{i=1}^{n_l} K_h(z - Z_{li}(\tau_{l,k})) \, \delta_{li,k}}{\sum_{i=1}^{n_l} K_h(z - Z_{li}(\tau_{l,k}))},$$

$$\hat S_{nl}(\tau_{l,k}, z) = 1 - \sum_{k'=1}^{k} \hat p_{nl,k'}(z),$$

J2iLl ^h{z - Zu^{Tl^k))5u,k 



lnl{t,z) = E' 



TJlUKuiz-Zuiri^k)) 

xx:i{^^/.4iog |~fi'<^g"''''['i 

^ i-Efc'<feP«i,fc(^) 

and they converge at the usual rate of the kernel estimators if the bandwidth 
tends to zero at the optimal rate for a p-dimensional covariate having 

a density with a s-order derivative. 
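
The kernel estimator of $p_{l,k}(z)$ is a ratio of two kernel-weighted sums. The following minimal sketch assumes a scalar covariate and a Gaussian kernel, neither of which is specified in the text; the function names and the data are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as an assumed choice of K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_p_hat(z, covariates, delta_k, h):
    """Kernel-weighted proportion of individuals observed on interval k.

    covariates : covariate value of each individual at tau_k (scalar case)
    delta_k    : observation indicators delta_{li,k} for the same individuals
    h          : bandwidth, with K_h(x) = K(x / h) / h
    """
    weights = [gaussian_kernel((z - zi) / h) / h for zi in covariates]
    denominator = sum(weights)
    if denominator == 0.0:
        raise ValueError("no kernel mass at z; increase the bandwidth")
    return sum(w * d for w, d in zip(weights, delta_k)) / denominator

# Toy data: individuals with small covariate values are observed.
covariates = [0.1, 0.2, 0.9, 1.0]
delta_k = [1, 1, 0, 0]
print(kernel_p_hat(0.15, covariates, delta_k, h=0.1))
print(kernel_p_hat(0.95, covariates, delta_k, h=0.1))
```

Because the weights are non-negative, the estimate always lies in $[0, 1]$; the bandwidth controls how local the proportion is.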

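For estimation in the proportional hazards model with constant covariates
$Z_{li,k}$ on $I_{l,k}$, let $\omega_{li,k} = \exp\{\beta_{l,k}' Z_{li,k}\}$ and $\Omega_l = \{\omega_{li,k}\}_{i \le n_l, k \le K_l}$,

$$\log \Delta S_l(I_{l,k}) = \log S_l(\tau_{l,k-1}) + \log\Big\{1 - \frac{S_l(\tau_{l,k})}{S_l(\tau_{l,k-1})}\Big\}
= -\Big\{\sum_{k'<k} \Delta_{l,k'} - \log(1 - e^{-\Delta_{l,k}})\Big\},$$

$$\log\{1 - p_l(\tau_{l,K_l}, Z_{li,k})\} = -\sum_{k=1}^{K_l} \omega_{li,k} \Delta_{l,k}, \tag{7}$$

$$\log p_{l,k}(Z_{li}) = \omega_{li,k} \log \Delta S_l(I_{l,k}) = -\omega_{li,k} \Big\{\sum_{k'<k} \Delta_{l,k'} - \log(1 - e^{-\Delta_{l,k}})\Big\}.$$

imsart-ejs ver. 2007/09/18 file: ejs_2007_128.tex date: February 2, 2008

O. Pons/Current status models

Denote $\mu_{l,k} = \log \Delta S_l(I_{l,k}) = \log p_l(I_{l,k})$, then the estimator of $p_{l,k}(Z_{li,k}) =
\exp\{\omega_{li,k} \mu_{l,k}\}$ of Proposition 3.1 has to be restricted to the individuals with the
same covariate value as $Z_{li,k}$.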
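
Relation (7) can be evaluated numerically once the increments and the regression parameter are given. The sketch below assumes a scalar covariate that is constant on each observation interval and a single scalar parameter `beta` shared by all intervals; both simplifications, and the function name, are assumptions made for the example.

```python
import math

def prob_unobserved(beta, z_by_interval, increments):
    """exp{-sum_k omega_k * Delta_k} with omega_k = exp(beta * z_k), as in (7).

    z_by_interval : covariate value of the individual on each interval
    increments    : baseline hazard increments Delta_k on the same intervals
    """
    total = sum(math.exp(beta * z) * d
                for z, d in zip(z_by_interval, increments))
    return math.exp(-total)

# With beta = 0 the covariate has no effect and the baseline value is recovered.
print(prob_unobserved(0.0, [1.0, 2.0], [0.3, 0.7]))
# A positive effect of a positive covariate lowers the probability of being unobserved.
print(prob_unobserved(0.5, [1.0, 2.0], [0.3, 0.7]))
```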

Proposition 3.2 //il; is a finite set {w; then 

, Pi{Ii,k,Zuk) 

^i-j ^ log — -]-;—. — , 

PAluk) 

and estimators are defined by 



Pnlill,k, Zlj) 
V'nl,k 

UnlJ = log 



logp„i,fc = log{n;"^ ^ Sii^k}, 

i=l 



Snl{jl,k, Zlj) = 1 — 



J2i<ni J2k' = l l{^ii,fc = ^ijl^i^fc 



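An estimator of $\Lambda_l(\tau_{l,k}, Z_{l,j})$ is deduced from the $\hat p_{nl}(I_{l,k}, Z_{l,j})$'s and (3) as
previously,

$$\hat\Lambda_{nl}(t, Z_{l,j}) = \sum_{k=1}^{K_l} 1\{\tau_{l,k-1} < t \le \tau_{l,k}\} \log \frac{1 - \sum_{k'<k} \hat p_{nl}(I_{l,k'}, Z_{l,j})}{1 - \sum_{k' \le k} \hat p_{nl}(I_{l,k'}, Z_{l,j})},$$

and the results of Proposition 3.1 extend to these estimators.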

Let $p_{0l,k}$, $S_{0l}$ and $\Lambda_{0l}$ be the actual values of the model parameters, then

Proposition 3.3 The estimators $\hat p_{nl,k}$, $\hat\Lambda_{nl,k}$ and $\hat S_{nl}$ are a.s. consistent as
$n \to \infty$, $n_l^{1/2}(\hat p_{nl,k} - p_{0l,k})_k$ converges to a centered Gaussian variable with
covariances $p_{0l,k}(1 - p_{0l,k})$ and zero otherwise, and the processes $n_l^{1/2}(\hat S_{nl} - S_{0l})$
and $n_l^{1/2}(\hat\Lambda_{nl} - \Lambda_{0l})$ converge to centered Gaussian processes with independent
increments and variances

$$n_l E(\hat S_{nl} - S_{0l})^2(\tau_{l,k}) = \sum_{k' \le k} p_{0l,k'}(1 - p_{0l,k'}),$$
niE(Anl - Aoif{Tl,k) = P0l,k'{l-P0l,k') ( p/^ . ^ ^TprCT -^-^ 1 )



k'<k 

2 



1 



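The proportional hazards model without finite $\Omega_l$ is still parametric but
maximum likelihood estimators are not written in closed form. Denoting $\Delta_{li,j} =
\Lambda_l(U_{li,j}) - \Lambda_l(U_{li,j-1})$, the probabilities are now

$$\log(1 - p_l(\tau_{l,K_l}, \beta_{l,k}, Z_{l,j}))
= \sum_{k=1}^{K_l} \sum_{j=1}^{J} 1\{I'_{li,j} \subset I_{l,k}\} \exp\{\beta_{l,k}' Z_{l,j}\} \{\log S_l(U_{li,j}) - \log S_l(U_{li,j-1})\}$$

$$= -\sum_{k=1}^{K_l} \sum_{j=1}^{J} 1\{I'_{li,j} \subset I_{l,k}\} \exp\{\beta_{l,k}' Z_{l,j}\} \Delta_{li,j},$$

$$\log p_{l,k}(Z_{li}) = \sum_{j=1}^{J} 1\{I'_{li,j} \subset I_{l,k}\} \exp\{\beta_{l,k}' Z_{l,j}\} \log \Delta S_l(I'_{li,j})$$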

= E ^i^kj C hk} eMPlkZi,j}[logSi{Uuj.,) + log{l - 
J 

+ log{l - exp(- exp{/3; feZ;,, }A,,,,)}]. 

When the covariates only depend on the observation intervals, the parameters are all
identifiable by maximization of the likelihood, as is the case with continuously
observed individuals. The parameters are not identifiable when the covariates
vary individually.



3.3. Estimation of the sample size 



The unknown population size $\nu$ has to be estimated. For a population of $L$
observed classes $C_1, \ldots, C_L$, of respective sizes $\nu_l$, estimators of the catching or
observation probabilities $p_{l,k}$ would be $n_{l,k} \nu_l^{-1}$ if $\nu_l$ was known, $k = 1, \ldots, K_l$.
By inverting this expression after an estimator $\hat p_{nl}$ has been defined, the sizes
are usually estimated by

$$\hat\nu_{nl} = \frac{n_l}{\hat p_{nl}}, \quad l = 1, \ldots, L, \qquad \hat\nu_n = \sum_{l=1}^{L} \hat\nu_{nl} = \sum_{l=1}^{L} \frac{n_l}{\hat p_{nl}}.$$

With consecutive intervals under the same conditions and with varying catching
or observation probabilities $p_{l,k}$, define a moving average estimator of $p_{l,k}$ and
mean estimators of classes and population sizes for $k > a > 1$ by
k'=k-aPni,k' ^ sr^ ni^k ^ 

Pnl.k — 7{ J '^nl — > , — , '^n ~ / ^ >^nl- 

2a f-" Pnl,k f-' 

k>a l—l 

The same method applies for covariate dependent probabilities, using the
estimators of section 3.2 and (5)-(6).
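
The inversion $\hat\nu_{nl} = n_l / \hat p_{nl}$ and its sum over classes can be sketched as follows. The function name and the counts are hypothetical, and the estimated observation probabilities are taken as given inputs rather than computed from data.

```python
def estimate_population(n_observed, p_hat):
    """Class sizes n_l / p_hat_l and their sum over the classes."""
    for p in p_hat:
        if not 0.0 < p <= 1.0:
            raise ValueError("each estimated probability must lie in (0, 1]")
    class_sizes = [n / p for n, p in zip(n_observed, p_hat)]
    return class_sizes, sum(class_sizes)

# Two hypothetical classes: 40 and 30 observed individuals.
class_sizes, total = estimate_population([40, 30], [0.8, 0.5])
print(class_sizes, total)
```

A small estimated probability inflates the estimated class size, so the precision of $\hat p_{nl}$ drives the precision of $\hat\nu_{nl}$.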



4. Models with dependent observations on consecutive intervals 
4.1. Nonparametric models 

When the probability of observing individuals in Ii^k depends on their observa- 
tion in several nonparametric models may be considered. Let 

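$$\pi_{l,k} = \Pr\{\tau_{l,k-1} < T_{li} \le \tau_{l,k+1} \mid \tau_{l,k-1} < T_{li} \le \tau_{l,k}\},$$

$$\pi_{l,k}(Z_{li}) = \Pr\{\tau_{l,k-1} < T_{li} \le \tau_{l,k+1} \mid \tau_{l,k-1} < T_{li} \le \tau_{l,k}, Z_{li}\},$$

then

$$p_{l,k,k+1} = \Pr\{\tau_{l,k-1} < T_{li} \le \tau_{l,k+1}\} = \pi_{l,k} \, p_{l,k}$$

and conditionally on $Z_{li}$, $p_{l,k,k+1}(Z_{li}) = \pi_{l,k}(Z_{li}) \, p_{l,k}(Z_{li})$. The estimators are
now defined for joint intervals,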



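$$\hat\pi_{nl,k} = \frac{\sum_{i=1}^{n_l} \delta_{li,k} \delta_{li,k+1}}{\sum_{i=1}^{n_l} \delta_{li,k}},
\qquad \hat p_{nl,k,k+1} = n_l^{-1} \sum_{i=1}^{n_l} \delta_{li,k} \delta_{li,k+1},$$

$$\hat p_{nl,k,k+1}(Z_{l,j}) = n_l^{-1} \sum_{i=1}^{n_l} \delta_{li,k} \delta_{li,k+1} 1\{I'_{li,j} \subset I_{l,k} \cup I_{l,k+1}\},$$

$$\hat\pi_{nl,k}(Z_{l,j}) = \frac{\sum_{i=1}^{n_l} \delta_{li,k} \delta_{li,k+1} 1\{I'_{li,j} \subset I_{l,k} \cup I_{l,k+1}\}}{\sum_{i=1}^{n_l} \delta_{li,k} 1\{I'_{li,j} \subset I_{l,k}\}}.$$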
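
The joint-interval estimators without covariates reduce to counting individuals observed on two consecutive intervals. This minimal sketch assumes the indicators are stored as rows `delta[i][k]` in which an individual may be observed on several intervals; the layout, the 0-based interval index and the function name are hypothetical.

```python
def consecutive_estimators(delta, k):
    """Return (p_joint, pi_hat) for intervals k and k + 1 (0-based index).

    p_joint : proportion of individuals observed on both intervals
    pi_hat  : proportion observed on k + 1 among those observed on k
    """
    n = len(delta)
    both = sum(row[k] * row[k + 1] for row in delta)
    first = sum(row[k] for row in delta)
    p_joint = both / n
    pi_hat = both / first if first > 0 else float("nan")
    return p_joint, pi_hat

delta = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
p_joint, pi_hat = consecutive_estimators(delta, 0)
p_next = sum(row[1] for row in delta) / len(delta)
# Under independence pi_hat should be close to the marginal estimate p_next.
print(p_joint, pi_hat, p_next)
```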



All the other models and estimators of section 14.11 are generalized by the same 
method. In the model without covariates, a test for the hypothesis $H_0$ of
independence between the intervals $I_{l,k}$ and $I_{l,k+1}$ is a test for
$p_{l,k,k+1} = p_{l,k} \, p_{l,k+1}$ or $\pi_{l,k} = p_{l,k+1}$.

Proposition 4.1 Under Hq, the statistic 

K,-l 



iPnl,kPnl,k+l — Pnl,k,k+iy 



PnLkPnl,k+l 



converges to a X{Ki-2y^ '^^ ^ 




Proof. Let $N_{l,k} = \sum_{i=1}^{n_l} N_{li}(I_{l,k})$, $N_{l,k,k+1} = \sum_{i=1}^{n_l} N_{li}(I_{l,k} \cup I_{l,k+1})$ and
^ _ "y^^ (iV;,fc,fc+i - n^'Ni^kNi^k+i? 

is the test statistic for independent marginals in a two-dimensional array. 



4.2. Markov models

As the individual classes change during the observation period, a second class
index may be incorporated in the model to take into account the evolution. Let
$C_{i,T_i}$ denote the class of individual $i$ at some observation time $T_i$,

= ^{Ci.Ti — Cl,C^j,- — Cl'}, 

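$$p_{l|l',k} = p_{l|l'}(I_{l,k}) = \Pr\{T_i \in I_{l,k}, C_{i,T_i} = C_l \mid C_{i,T_i^-} = C_{l'}\},$$

$$S_{l|l',k}(t) = \Pr\{T_i \in I_{l,k}, T_i > t, C_{i,T_i} = C_l \mid C_{i,T_i^-} = C_{l'}\},$$

$$\lambda_{l|l',k}(t) = \lim_{h \to 0} h^{-1} \Pr\{T_i \in [t, t + h), C_{i,T_i} = C_l \mid T_i \ge t, C_{i,T_i^-} = C_{l'}\}.$$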

The likelihood is proportional to 

L Ki n L 



1=1 k=l i=l i' = l 

and the estimators become 

Pnl\l' .k x^ni 5 

i=l = l Ou,k'Vll',i 
^nl\l'[Tl,k) — r -^^i , 

K J . ^ ^ 

V^Y^ir ^ + ^ 11 2^k'<kPnl\l',k 

^^l{ri,k-i<t<v,,}log^— ^— -. 

7 = 1 l^k'<kPnl\l' ,k 



Ki\v{t) 



The extension to models and estimators with covariates follows easily from
section 3.2. A test for the hypothesis $H_0$ of independence between observation and
the variation between classes is a test for $p_{l|l',k} = p_{l,k} \Pr\{C_{i,T_i} = C_l \mid C_{i,T_i^-} =
C_{l'}\}$ for every $l, l' = 1, \ldots, L$ and $k = 1, \ldots, K_l$.

Let $q_{ll'} = \Pr\{C_{i,T_i} = C_l, C_{i,T_i^-} = C_{l'}\}$, then the estimators

InlV = , PnW ,k = Pnlll'k QnW , 

ni 

provide a test statistic. 
Proposition 4.2 Under $H_0$, the statistic



Ki L 



{Pnll'.k — PnlM^nwY 
Pnl,k Qnll' 



fe=l 1=1 

converges to a xfKi-i){L-i) ~* 